GH-36795: [C#] Implement support for dense and sparse unions #36797
{
    UnionMode.Sparse => "+us:",
    UnionMode.Dense => "+ud:",
    _ => throw new InvalidDataException($"Unsupported time unit for export: {unionType.Mode}"),
_ => throw new InvalidDataException($"Unsupported time unit for export: {unionType.Mode}"),
_ => throw new InvalidDataException($"Unsupported union mode for export: {unionType.Mode}"),
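For context on the prefixes in this hunk: in the Arrow C Data Interface, a union format string is `+us:` (sparse) or `+ud:` (dense) followed by the comma-separated type ids. A minimal standalone sketch of that construction follows. The `UnionMode` enum here is a local stand-in and `UnionFormatSketch.GetFormat` is a hypothetical helper, not the exporter code from this PR.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Local stand-in for the library's UnionMode enum so the sketch compiles on its own.
public enum UnionMode { Sparse, Dense }

public static class UnionFormatSketch
{
    // Hypothetical helper: builds "+us:" or "+ud:" followed by comma-separated type ids.
    public static string GetFormat(UnionMode mode, IReadOnlyList<int> typeIds)
    {
        string prefix = mode switch
        {
            UnionMode.Sparse => "+us:",
            UnionMode.Dense => "+ud:",
            // Error text names the union mode, matching the suggested wording.
            _ => throw new InvalidDataException($"Unsupported union mode for export: {mode}"),
        };
        return prefix + string.Join(",", typeIds);
    }
}
```

Under these assumptions, a dense union with type ids 0 and 1 would be described as `+ud:0,1`.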
: base(data)
{
    ValidateMode(UnionMode.Dense, Type.Mode);
    data.EnsureBufferCount(2); // TODO:
maybe link this TODO to a new issue? (and what is the TODO about, given SparseUnionArray lacks one?)
Heck if I can remember... Will just remove the TODO.
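As background for the buffer count of 2: in the Arrow columnar format a dense union carries two buffers (type ids and int32 offsets), while a sparse union carries only the type ids buffer. A rough sketch of how a logical index resolves to a child slot in each mode follows. `UnionLookupSketch` is hypothetical, and it assumes each type id is also the child index, which need not hold in general.

```csharp
using System;

public static class UnionLookupSketch
{
    // Dense: typeIds[i] selects the child (assumed equal to the child index here),
    // and offsets[i] gives the slot within that child.
    public static (int Child, int Slot) ResolveDense(sbyte[] typeIds, int[] offsets, int i)
        => (typeIds[i], offsets[i]);

    // Sparse: every child has the same length as the union, so the slot is just i.
    public static (int Child, int Slot) ResolveSparse(sbyte[] typeIds, int i)
        => (typeIds[i], i);
}
```

The sparse case needs no offsets buffer at all, which is consistent with SparseUnionArray having no equivalent of the two-buffer check.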
Hmm, it seems C# producing unions fails in the integration test, though
I thought this had worked before, but it clearly can't have. There may be more than one problem, but the obvious one is the exception thrown when comparing the data generated from the JSON template with the checked-in data. The template for union test data contains duplicate names -- two fields named "struct" and two fields named "dense". One of each field is marked as nullable in the JSON and one as non-nullable. But the data file that's been checked in shows all four fields as being non-nullable. I'm not sure what to make of this. I don't see code in the Java or C++ implementations which special-cases anything about nullability -- either in reading from the JSON template or in the implementation classes themselves.
Ah...right before 1.0 unions were changed: they no longer have a top-level nullability. I guess the files may not have been changed to reflect this.
I still don't understand how this is working for the other implementations. I've looked at the code for Field and for the Field comparers and for the JSON import and there's no special case which says "ignore the nullability bit on Field if this is a Union". I'll have to come back to it later.
It looks like some changes made to Archery in the last week have fixed the errors we were getting. I'm ... not entirely comfortable with that, I guess, but I spent a lot of time trying to understand if/what the C# implementation was doing differently without any success. Edit: nope, looks like one problem solved, one still remains.
Likely b957847? Which does now explicitly generate union fields with/without nullability to check what happens.
The Java error makes it sound like C# is still writing with metadata version 4: Lines 38 to 48 in fa43106
That may also explain things, if it's triggering some compatibility path, e.g.: arrow/cpp/src/arrow/ipc/reader.cc Lines 391 to 413 in fa43106
Thanks @lidavidm for your help; couldn't have made it work without you!
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit e55f912. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.
…pache#36797) ### What changes are included in this PR? Support dense and sparse unions in the C# implementation. Adds Archery support for C# unions. ### Are these changes tested? Yes ### Are there any user-facing changes? Unions are now supported in the C# implementation. **This PR includes breaking changes to public APIs.** The public APIs for the UnionArray and UnionType were changed fairly substantially. As these were previously not implemented properly, the impact of the changes ought to be minimal. The ChunkedArray and Column classes were changed to hold IArrowArrays instead of Arrays. To accommodate this, a constructor was added which may introduce ambiguity in calling code. This could be avoided by changing the overloaded constructor to instead be a factory method. This didn't seem worthwhile but could be reconsidered. The metadata version was finally increased to V5. * Closes: apache#36795 Authored-by: Curt Hagenlocher <[email protected]> Signed-off-by: David Li <[email protected]>
What changes are included in this PR?
Support dense and sparse unions in the C# implementation.
Adds Archery support for C# unions.
Are these changes tested?
Yes
Are there any user-facing changes?
Unions are now supported in the C# implementation.
This PR includes breaking changes to public APIs.
The public APIs for the UnionArray and UnionType were changed fairly substantially. As these were previously not implemented properly, the impact of the changes ought to be minimal.
The ChunkedArray and Column classes were changed to hold IArrowArrays instead of Arrays. To accommodate this, a constructor was added which may introduce ambiguity in calling code. This could be avoided by changing the overloaded constructor to instead be a factory method. This didn't seem worthwhile but could be reconsidered.
The metadata version was finally increased to V5.
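
To illustrate the factory-method alternative mentioned for ChunkedArray and Column, here is a rough sketch using hypothetical stand-in types (`IArrowArraySketch`, `ArraySketch`, `ChunkedArraySketch`); none of these are the Apache.Arrow classes, and the method names are made up. The idea is that distinctly named factories cannot collide during overload resolution the way two constructors taking similar list types can.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; not the Apache.Arrow types.
public interface IArrowArraySketch { int Length { get; } }

public sealed class ArraySketch : IArrowArraySketch
{
    public ArraySketch(int length) { Length = length; }
    public int Length { get; }
}

public sealed class ChunkedArraySketch
{
    private readonly IReadOnlyList<IArrowArraySketch> _chunks;

    // Private constructor: callers must pick one of the named factories below.
    private ChunkedArraySketch(IReadOnlyList<IArrowArraySketch> chunks) { _chunks = chunks; }

    // Named factory for concrete arrays (the pre-existing calling pattern).
    public static ChunkedArraySketch FromArrays(IEnumerable<ArraySketch> arrays)
        => new ChunkedArraySketch(new List<IArrowArraySketch>(arrays));

    // Named factory for interface-typed arrays (needed for types such as unions).
    public static ChunkedArraySketch FromArrowArrays(IEnumerable<IArrowArraySketch> arrays)
        => new ChunkedArraySketch(new List<IArrowArraySketch>(arrays));

    // Total number of elements across all chunks.
    public long Length
    {
        get
        {
            long total = 0;
            foreach (var chunk in _chunks) total += chunk.Length;
            return total;
        }
    }
}
```

The trade-off is that existing callers of the constructor would have to change, which may be part of why the PR kept the overload.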